Study of semantic relatedness of words using collaboratively constructed semantic resources

Author

  • Torsten Zesch
Abstract

Computing the semantic relatedness between words is a pervasive task in natural language processing, with applications in, e.g., word sense disambiguation, semantic information retrieval, or information extraction. Semantic relatedness measures typically rely on linguistic knowledge resources such as WordNet, whose construction is very expensive and time-consuming. So far, the insufficient coverage of these linguistic resources has been a major impediment to using semantic relatedness measures in large-scale natural language processing applications. However, the World Wide Web is currently undergoing a major change, as more and more people actively contribute to new resources in the so-called Web 2.0. Some of these rapidly growing, collaboratively constructed resources, such as Wikipedia and Wiktionary, have the potential to serve as a new kind of semantic resource owing to their increasing size and their significant coverage of past and current developments. In this thesis, we present a comprehensive study aimed at computing the semantic relatedness of word pairs using such collaboratively constructed semantic resources. We analyze the properties of the emerging collaboratively constructed semantic resources Wikipedia and Wiktionary and compare them to classical, linguistically constructed semantic resources such as WordNet and GermaNet. We show that collaboratively constructed semantic resources differ significantly from linguistically constructed ones, and argue why this constitutes both an asset and an impediment for research in natural language processing. To handle the growing number of available semantic resources, we propose a representational interoperability framework that represents and accesses all semantic resources in a uniform manner.
We give a detailed overview of the state of the art in computing semantic relatedness and categorize semantic relatedness measures into four types according to their working principles and the properties of the semantic resources they use. We investigate how existing semantic relatedness measures can be adapted to collaboratively constructed semantic resources, bridging the observed differences between semantic resources. For that purpose, we perform a graph-theoretic analysis of semantic resources to prove that semantic relatedness measures working on graphs can be correctly adapted. For the first time, we generalize a state-of-the-art vector-based semantic relatedness measure to every semantic resource from which a textual description can be retrieved or constructed for each concept. This generalized measure turns out to be the most versatile one, as it is easily applicable to all semantic resources. For the first time, we also show (using the German Wikipedia as an example) that the growth of a resource has little or no negative effect on the performance of semantic relatedness measures, while their coverage steadily increases. We intrinsically evaluate the adapted semantic relatedness measures on two tasks: (i) comparison with human judgments, and (ii) solving word choice problems. Additionally, we extrinsically evaluate semantic relatedness measures on the task of keyphrase extraction, and propose a new keyphrase extraction approach based on semantic relatedness measures, whose goal is to find infrequently used words in a document that are semantically connected to many other words in the document. For the purpose of evaluating keyphrase extraction, we developed a new evaluation strategy ...
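
The abstract does not spell out how the generalized vector-based measure or the keyphrase scoring work, so the following Python code is only a minimal sketch under stated assumptions. It represents each word as a vector over concepts, weighting a concept by the TF-IDF score of the word in that concept's textual description, and takes the cosine of two such vectors as their relatedness; this follows the general idea of Explicit Semantic Analysis, which is an assumption here rather than a detail given above. The hypothetical `rank_keyphrases` helper mimics only the stated goal of the keyphrase approach: favor words that occur rarely in the document but are related to many other words in it. The names `build_concept_index`, `relatedness`, and `rank_keyphrases`, the naive tokenizer, the toy descriptions, and the `threshold` value are all illustrative choices, not taken from the thesis.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation (a simplification)."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def build_concept_index(descriptions):
    """Map each word to a sparse concept vector {concept: tf-idf weight}.

    `descriptions` maps a concept (e.g. an article title or a dictionary entry)
    to its textual description. Words occurring in every description get
    idf = 0 and are dropped, since they do not discriminate between concepts.
    """
    n = len(descriptions)
    tf = {c: Counter(tokenize(d)) for c, d in descriptions.items()}
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # document frequency: one count per description
    index = {}
    for concept, counts in tf.items():
        for word, freq in counts.items():
            idf = math.log(n / df[word])
            if idf > 0:
                index.setdefault(word, {})[concept] = freq * idf
    return index

def relatedness(w1, w2, index):
    """Cosine of the two words' concept vectors; 0.0 if either word is unknown."""
    v1, v2 = index.get(w1, {}), index.get(w2, {})
    dot = sum(weight * v2[c] for c, weight in v1.items() if c in v2)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def rank_keyphrases(document, index, threshold=0.1, top_k=5):
    """Hypothetical scoring: count the other document words whose relatedness
    exceeds `threshold`, divided by how often the word occurs in the document,
    so that rarely used but well-connected words rank highest."""
    counts = Counter(tokenize(document))
    words = list(counts)
    scores = {}
    for w in words:
        degree = sum(1 for o in words if o != w and relatedness(w, o, index) > threshold)
        scores[w] = degree / counts[w]
    # sorted() is stable, so ties keep their order of first occurrence
    return sorted(words, key=lambda w: scores[w], reverse=True)[:top_k]

if __name__ == "__main__":
    # Toy concept descriptions standing in for Wikipedia articles or Wiktionary glosses
    descriptions = {
        "Car": "a car is a motor vehicle with wheels and an engine",
        "Bicycle": "a bicycle is a vehicle with two wheels and pedals",
        "Banana": "a banana is a yellow fruit",
        "Apple": "an apple is a red or green fruit",
    }
    index = build_concept_index(descriptions)
    print(relatedness("car", "vehicle", index))  # about 0.707
    print(relatedness("car", "banana", index))   # 0.0
    print(rank_keyphrases("car vehicle wheels banana", index))
    # ['car', 'vehicle', 'wheels', 'banana']
```

The only resource-specific input is the mapping from concepts to descriptions, which is why a measure of this shape can, in principle, be pointed at any resource that offers a textual description per concept. A realistic setup would additionally need proper tokenization, stopword handling, and pruning of the concept vectors.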

Similar resources

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers written in natural language using computational methods and machine learning algorithms. The development of large-scale smart education systems on the one hand, and the importance of assessment as a key factor in the learning process and the challenges it faces on the other, have significantly increased the need for ...

Wisdom of crowds versus wisdom of linguists - measuring the semantic relatedness of words

In this article, we present a comprehensive study aimed at computing semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation...

Rapid Development Process of Spoken Dialogue Systems using Collaboratively Constructed Semantic Resources

We herein propose a method for the rapid development of a spoken dialogue system based on collaboratively constructed semantic resources and compare the proposed method with a conventional method that is based on a relational database. Previous development frameworks of spoken dialogue systems, which presuppose a relational database management system as a background application, require complex...

Framework for the Development of Spoken Dialogue System based on Collaboratively Constructed Semantic Resources

We herein introduce our project of realizing a framework for the development of a spoken dialogue system based on collaboratively constructed semantic resources. We demonstrate that a semantic Web-oriented approach based on collaboratively constructed semantic resources significantly reduces troublesome rule descriptions and complex configurations, which are caused by the previous relational da...

Publication date: 2010